AITopics | prediction task

Digital personas powered by Large Language Models (LLMs) are increasingly proposed as substitutes for human survey respondents, yet it remains unclear when they can reliably approximate human survey findings. We answer this question using the LISS panel, constructing personas from respondents' background variables and pre-2023 survey histories, then testing them against the same respondents' held-out post-cutoff answers. Across four persona architectures, three LLMs, and two prediction tasks, we assess performance at the question, respondent, distributional, equity, and clustering levels. Digital personas improve alignment with human response distributions, especially in domains tied to stable attributes and values, but remain limited for individual prediction and fail to recover multivariate respondent structure. Retrieval-augmented architectures provide the clearest gains, but performance depends more on human response structure than on model choice: personas perform best for low-variability questions and common respondent patterns, and worst for subjective, heterogeneous, or rare responses. Our results provide practical guidance on when digital personas could be appropriate for survey research and when human validation remains necessary.

artificial intelligence, large language model, natural language, (17 more...)

arXiv.org Machine Learning

2605.10659

Country:

North America > United States (0.67)
North America > Canada > Ontario (0.14)

Genre:

Research Report > New Finding (1.00)
Questionnaire & Opinion Survey (1.00)

Technology: Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)

d60e14c19cd6e0fc38556ad29ac8fbc9-Supplemental-Conference.pdf

Neural Information Processing Systems, Apr-29-2026, 22:17:37 GMT

artificial intelligence, computation time, machine learning, (17 more...)

Neural Information Processing Systems

Genre: Research Report (0.32)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Performance Analysis > Accuracy (0.70)

8f61049e8fe5b9ed714860b951066f1e-Paper-Datasets_and_Benchmarks.pdf

Neural Information Processing Systems, Apr-29-2026, 00:19:23 GMT

artificial intelligence, machine learning, modality, (19 more...)

Neural Information Processing Systems

Country: North America > United States (0.28)

Genre: Research Report (0.69)

Industry:

Health & Medicine > Health Care Technology (1.00)
Health & Medicine > Health Care Providers & Services (1.00)
Health & Medicine > Diagnostic Medicine (1.00)
(3 more...)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (0.93)
Information Technology > Data Science (0.93)
Information Technology > Artificial Intelligence > Machine Learning > Performance Analysis > Accuracy (0.68)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.46)

32e54441e6382a7fbacbbbaf3c450059-Supplemental.pdf

Neural Information Processing Systems, Apr-25-2026, 10:10:28 GMT

artificial intelligence, dataset, machine learning, (18 more...)

Neural Information Processing Systems

Country: North America > United States (1.00)

Genre: Research Report > New Finding (0.93)

Industry:

Education > Educational Setting (0.95)
Law (0.93)
Government > Regional Government > North America Government > United States Government (0.71)

Technology: Information Technology > Artificial Intelligence > Machine Learning (1.00)

Retiring Adult: New Datasets for Fair Machine Learning

Neural Information Processing Systems, Apr-25-2026, 10:10:23 GMT

Although the fairness community has recognized the importance of data, researchers in the area primarily rely on UCI Adult when it comes to tabular data. Derived from a 1994 US Census survey, this dataset has appeared in hundreds of research papers where it served as the basis for the development and comparison of many algorithmic fairness interventions. We reconstruct a superset of the UCI Adult data from available US Census sources and reveal idiosyncrasies of the UCI Adult dataset that limit its external validity. Our primary contribution is a suite of new datasets derived from US Census surveys that extend the existing data ecosystem for research on fair machine learning. We create prediction tasks relating to income, employment, health, transportation, and housing. The data span multiple years and all states of the United States, allowing researchers to study temporal shift and geographic variation. We highlight a broad initial sweep of new empirical insights relating to trade-offs between fairness criteria, performance of algorithmic interventions, and the role of distribution shift based on our new datasets. Our findings inform ongoing debates, challenge some existing narratives, and point to future research directions.

artificial intelligence, dataset, machine learning, (16 more...)

Neural Information Processing Systems

Country: North America > United States (1.00)

Genre: Research Report > New Finding (1.00)

Industry:

Law (1.00)
Information Technology (0.68)
Government > Regional Government > North America Government > United States Government (0.46)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Performance Analysis > Accuracy (0.69)

1af83ab66b4f07a3f55788e67dab5782-Paper-Conference.pdf

Neural Information Processing Systems, Apr-24-2026, 22:04:06 GMT

machine learning, natural language, rademacher complexity, (17 more...)

Neural Information Processing Systems

Country:

North America > United States (0.28)
Europe (0.28)

Genre: Research Report (0.46)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Natural Language (0.93)
Information Technology > Artificial Intelligence > Machine Learning > Inductive Learning (0.71)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (0.47)

03b2ceb73723f8b53cd533e4fba898ee-Paper.pdf

Neural Information Processing Systems, Apr-24-2026, 10:52:56 GMT

artificial intelligence, machine learning, natural language, (17 more...)

Neural Information Processing Systems

Genre:

Research Report > New Finding (0.46)
Research Report > Promising Solution (0.46)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (1.00)
Information Technology > Information Management (0.95)
Information Technology > Artificial Intelligence > Natural Language (0.94)
Information Technology > Artificial Intelligence > Representation & Reasoning > Search (0.75)

066b98e63313162f6562b35962671288-Paper-Datasets_and_Benchmarks.pdf

Neural Information Processing Systems, Apr-24-2026, 09:32:25 GMT

data mining, machine learning, natural language, (21 more...)

Neural Information Processing Systems

Country: North America > Canada > Quebec (0.28)

Industry:

Information Technology (0.93)
Health & Medicine (0.68)
Transportation > Air (0.46)

Technology:

Information Technology > Data Science > Data Mining (1.00)
Information Technology > Communications > Social Media (0.94)
Information Technology > Artificial Intelligence > Representation & Reasoning > Personal Assistant Systems (0.68)
(2 more...)

Aligning Validation with Deployment: Target-Weighted Cross-Validation for Spatial Prediction

Brenning, Alexander, Suesse, Thomas

arXiv.org Machine Learning, Apr-1-2026

Cross-validation (CV) is commonly used to estimate predictive risk when independent test data are unavailable. Its validity depends on the assumption that validation tasks are sampled from the same distribution as prediction tasks encountered during deployment. In spatial prediction and other settings with structured data, this assumption is frequently violated, leading to biased estimates of deployment risk. We propose Target-Weighted CV (TWCV), an estimator of deployment risk that corrects for discrepancies between validation and deployment task distributions, namely (1) covariate shift and (2) task-difficulty shift. We characterize prediction tasks by descriptors such as covariates and spatial configuration. TWCV assigns weights to validation losses such that the weighted empirical distribution of validation tasks matches the corresponding distribution over a target domain. The weights are obtained via calibration weighting, yielding an importance-weighted estimator that targets deployment risk. Since TWCV requires adequate coverage of the deployment distribution's support, we combine it with spatially buffered resampling that diversifies the task difficulty distribution. In a simulation study, conventional as well as spatial estimators exhibit substantial bias depending on sampling, whereas buffered TWCV remains approximately unbiased across scenarios. A case study in environmental pollution mapping further confirms that discrepancies between validation and deployment task distributions can affect performance assessment, and that buffered TWCV better reflects the prediction task over the target domain. These results establish task distribution mismatch as a primary source of CV bias in spatial prediction and show that calibration weighting combined with a suitable validation task generator provides a viable approach to estimating predictive risk under dataset shift.
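The weighting idea in this abstract can be illustrated with a minimal sketch. It is not the authors' implementation: it assumes a generic exponential-tilting form of calibration weighting that matches only the mean of the task descriptors, and it omits the spatially buffered resampling step. The function names `calibration_weights` and `weighted_risk`, and the toy descriptor and loss values, are hypothetical.

```python
import numpy as np

def calibration_weights(X_val, target_mean, n_iter=50, tol=1e-10):
    """Weights w (non-negative, summing to 1) of the form w_i ∝ exp(lam · x_i)
    whose weighted descriptor mean matches target_mean.

    Assumption: only first moments are calibrated; the paper may match more.
    """
    Z = X_val - target_mean                # centred descriptors, shape (n, d)
    lam = np.zeros(Z.shape[1])

    def weights(lam):
        logits = Z @ lam
        w = np.exp(logits - logits.max())  # subtract max for numerical stability
        return w / w.sum()

    for _ in range(n_iter):
        w = weights(lam)
        grad = w @ Z                       # weighted mean of centred descriptors
        if np.linalg.norm(grad) < tol:
            break
        Zc = Z - grad
        hess = (Zc * w[:, None]).T @ Zc    # weighted covariance of descriptors
        # Newton step on the convex dual; tiny ridge guards against singularity
        lam = lam - np.linalg.solve(hess + 1e-12 * np.eye(lam.size), grad)
    return weights(lam)

def weighted_risk(losses, w):
    """Importance-weighted average of per-task validation losses."""
    return float(np.dot(w, losses))

# Toy example (hypothetical data): one descriptor per validation task,
# e.g. a task-difficulty score, with loss increasing in difficulty.
difficulty = np.linspace(0.0, 1.0, 11)
losses = 1.0 + 2.0 * difficulty
w = calibration_weights(difficulty[:, None], np.array([0.7]))
naive = float(losses.mean())               # ignores the shift in difficulty
adjusted = weighted_risk(losses, w)        # reweighted towards the target
```

In this toy setup the target domain is harder on average than the validation tasks, so the reweighted estimate is higher than the unweighted mean. The Newton iteration has no line search, so it is only a reasonable choice when the target mean lies well inside the range of the validation descriptors.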

artificial intelligence, machine learning, modeling & simulation, (17 more...)

arXiv.org Machine Learning

2603.29981

Country:

Europe > Germany (0.14)
North America > United States > New York (0.04)
North America > United States > Massachusetts > Middlesex County > Cambridge (0.04)
(2 more...)

Genre: Research Report > New Finding (1.00)

Industry:

Education (0.66)
Law (0.48)

Technology:

Information Technology > Modeling & Simulation (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Performance Analysis > Cross Validation (0.62)

SRFUND: A Multi-Granularity Hierarchical Structure Reconstruction Benchmark in Form Understanding

Neural Information Processing Systems, Mar-22-2026, 12:44:21 GMT

Accurately identifying and organizing textual content is crucial for the automation of document processing in the field of form understanding. Existing datasets, such as FUNSD and XFUND, support entity classification and relationship prediction tasks but are typically limited to local and entity-level annotations. This limitation overlooks the hierarchically structured representation of documents, constraining comprehensive understanding of complex forms. To address this issue, we present SRFUND, a hierarchically structured multi-task form understanding benchmark. SRFUND provides refined annotations on top of the original FUNSD and XFUND datasets, encompassing five tasks: (1) word to text-line merging, (2) text-line to entity merging, (3) entity category classification, (4) item table localization, and (5) entity-based full-document hierarchical structure recovery. We meticulously supplemented the original dataset with missing annotations at various levels of granularity and added detailed annotations for multi-item table regions within the forms. Additionally, we introduce global hierarchical structure dependencies for entity relation prediction tasks, surpassing traditional local key-value associations. The SRFUND dataset covers eight languages (English, Chinese, Japanese, German, French, Spanish, Italian, and Portuguese), making it a powerful tool for cross-lingual form understanding. Extensive experimental results demonstrate that the SRFUND dataset presents new challenges and significant opportunities in handling diverse layouts and global hierarchical structures of forms, thus providing deep insights into the field of form understanding.

artificial intelligence, name change, proceedings, (7 more...)

Neural Information Processing Systems

Technology: Information Technology > Artificial Intelligence (0.61)

When Can Digital Personas Reliably Approximate Human Survey Findings?
